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Abstract 

In this paper, we propose a method to apply the popular 
cascade classifier into face recognition to improve the com- 
putational efficiency while keeping high recognition rate. 
In large scale face recognition systems, because the proba- 
bility of feature templates coming from different subjects is 
very high, most of the matching pairs will be rejected by the 
early stages of the cascade. Therefore, the cascade can im- 
prove the matching speed significantly. On the other hand, 
using the nested structure of the cascade, we could drop 
some stages at the end of feature to reduce the memory and 
bandwidth usage in some resources intensive system while 
not sacrificing the performance too much. The cascade is 
learned by two steps. Firstly, some kind of prepared features 
are grouped into several nested stages. And then, the thresh- 
old of each stage is learned to achieve user defined verifi- 
cation rate (VR). In the paper, we take a landmark based 
Gabor+LDA face recognition system as baseline to illus- 
trate the process and advantages of the proposed method. 
However, the use of this method is very generic and not lim- 
ited in face recognition, which can be easily generalized 
to other biometrics as a post-processing module. Experi- 
ments on the FERET database show the good performance 
of our baseline and an experiment on a self-collected large 
scale database illustrates that the cascade can improve the 
matching speed significantly. 

1. Introduction 

Recently, face recognition technologies are applied in 
more and more applications with big data, such as, face 
recognition for large scale social network services [JJ, and 
face recognition system for surveillance. These large scale 
systems usually contain many millions of templates in the 
database, which are deployed centralized or distributed. 
And they need to deal with the requests from the users or 
other image sources {e.g.. Surveillance Camera) continu- 
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ously. Big data and high intensive request bring about many 
strict requirements to the efficiency of face recognition sys- 
tems. The efficiency of the systems are mainly determined 
by two factors: the size of feature template (Storage and 
Transmission) and the speed of template matching (Com- 
putation). In this paper, we will focus on these two factors 
and propose a generic and lightweight method for efficient 
large scale face recognition. 

The computational complexity of the most popular 
matching algorithms are linear with respect to the number 
of templates n and the dimension of feature d, such as Eu- 
clidean distance. Cosine similarity and so on. There are 
two kinds of methods to reduce the computation: approxi- 
mated nearest neighborhood methods |i26J and partial fea- 
ture based filtering |23 |. The details of these methods will 
be described in the next section. In this paper, mainly in- 
spired by [lOJ . we propose a novel fast matching method by 
applying the cascade classifier II2TII from face detection to 
face recognition. For face recognition, especially in large 
scale applications, the matching between the probe and the 
gallery set is an asymmetry classification problem, in which 
the majority of matching pairs are negative (two images 
with different identity). This is the case that the cascade 
classifier can work efficiently. 

While feature template has been grouped into several 
stages, we can drop several stages at the end of the cascade 
to obtain smaller feature template. Small template can obvi- 
ously save the storage space and improve the transmission 
speed of system, but it will also reduce the recognition rate. 
Therefore, we need to consider the trade-off between fea- 
ture length and performance for specific needs. The spirit 
is very similar to the Scalable Video Coding (SVC) |4| in 
H.264 video compression standard. SVC standardizes the 
encoding of a high-quality video bitstream that also con- 
tains one or more subset bitstreams. A subset video bit- 
stream is derived by dropping packets from the larger video 
to reduce the bandwidth required for the subset bitstream. 
The subset bitstream can represent a lower spatial resolu- 
tion, lower temporal resolution, or lower quality video sig- 
nal. Similarly, the cascaded feature template contains many 



subset feature bits, and the subset has lower recognition 
rate. 

In summary, the cascade can provide two advantages for 
large scale face recognition systems: (1) quickly reject neg- 
ative face pairs to improve the matching speed; (2) accord- 
ing to the specific storage and computation requirements, 
supply a scalable structure for user to select the size of fea- 
ture. To illustrate the advantages, we propose a EBGM- 
like 122] baseline system. First, a face detector and ASM 
are used to localize serval facial landmarks. Then, multi- 
scale and orientation Gabor |12| features are extracted on 
the landmarks. Finally, LDA is applied to enhance the dis- 
crimination and reduce the dimensionality of feature. The 
similarity of features are evaluated by Cosine metric. 

The main contributions of the paper are as follows: 

1. By studying the asymmetric property of large scale 
face recognition problem, we apply cascade classifier 
for fast face matching. As a generic and lightweight 
post-processing module, the cascade classifier can be 
used widely to improve the speed of other biometric 
systems; 

2. We propose a new concept "feature subset" for bio- 
metrics by cutting-off several stages of the cascade, 
and give the relationship between feature length and 
recognition rate by experiments; 

3. To illustrate the advantages of our method, we propose 
an EBGM-like |22| baseline face recognition method, 
the performance of which is comparable to many state- 
of-the-art methods lfT9ll25ll20l. 

2. Related Work 

Fast algorithm for large scale pattern recognition prob- 
lems has a long history in the field of image retrieval, but 
don't get much attention in face recognition community. 
The most popular fast algorithm for large scale image re- 
trieval is approximated nearest neighborhood, such as k-d 
tree M and Locality-Sensitive Hashing (LSH) Qlll. But 
||23]| has found these approximated nearest neighbor search 
methods do not work well with high-dimensional face fea- 
tures, that means their performance degrade quickly in face 
recognition with database increasing. 

In ll23l . Wu et al. proposed a multi -reference re-ranking 
approach for large scale face recognition. The main idea 
is originated from the query expansion techniques in text 
information retrieval. Firstly, many local features of face 
components are used to get a small candidate set from the 
large gallery. Then a binary global feature is used to re-rank 
the candidate set to get the final result. Experiments show 
the performance of their algorithm is comparable with the 
linear scan system using the state-of-the-art face feature. On 
a database containing one million face images, the speed-up 



ratio is about 8x comparing to the linear scan system. llT3l 
also used similar way to improve the speed of face recog- 
nition by sifting the gallery according to rank. Our method 
is closely related to these two methods, the idea is using 
partial or simple features to reject the irrelevant samples as 
quickly as possible. 

Inspired by the ideas of popular Hashing based meth- 
ods fT, "^l, Yan et al. Il26l proposed a Similarity Hashing 
(SH) for large scale face recognition, and got good perfor- 
mance on a database containing 100,000 face images. Al- 
though SH has achieved 30x speed-up ratio in the experi- 
ments, it is very memory consuming and need extra tens of 
Giga-Bytes to store the hash index for every samples in the 
gallery. Because our method exploits the asymmetric struc- 
ture of the data in large scale face recognition, comparing to 
the above methods, our cascade structure not only can ob- 
tain high speed-up ratio but also has the lowest performance 
loss. 

Cascade is a kind of extreme unbalanced tree to deal with 
asymmetric two-class problem. The most successful appU- 
cation of cascade is face detection fSTl, in which the cas- 
cade was used to reject non-face samples at each node (or 
stage) and nearly allow all face samples to pass. In ifTOl 
the cascade was used for face recognition, but its advantage 
was not noticed and analyzed in the context of large scale 
face recognition. In this paper, we take face matching as 
a two-class problem, then large face recognition problem 
is exactly an asymmetric classification problem. Given a 
probe, the majority of samples in gallery should have dif- 
ferent identity with the probe. Therefore, we borrow the 
cascade structure from face detection to large scale face 
recognition in this case. We will see that the cascade can 
also work well for face recognition. The process of cascade 
learning in face recognition is simpler than that in face de- 
tection, because we only need to learn the threshold of each 
stage but the strong classifier. 

3. Baseline Face Recognition Method 

To illustrate the concept and the advantages of the pro- 
posed method, we will discuss the steps and algorithms 
base on a baseline face recognition method. The baseline 
is similar with the popular EBGM and Bochum/USC sys- 
tem II22I [TSl . Firstly, we detect 76 facial landmarks on 
a face image, and normalize the face image according to 
some reference points {e.g., two eyes). Then Difference of 
Gaussian (DoG) filter llT9l is used for illumination normal- 
ization. Second, we extract multi-scale Gabor feature on 
the 76 landmarks. And then a holistic subspace learning is 
applied on the Gabor features to enhance the discriminative 
ability. The detail steps are described in the following sub- 
sections. 




Figure 1. Two face images in FERET database and their 76 facial 
landmarks localized by our ASM landmarker 

3.1. Face Normalization 

Because face detection is relative irrelevant to this paper, 
we just skip this step. After face detection, we localize the 
facial landmarks by Active Shape Model (ASM) |6|. ASM 
is composed by three parts: shape model, local experts and 
optimization strategy. 

In the most ASM variants, shape model is usually 
PCA 191, and we follow this model. For local experts, we 
use LBP fT] feature and Boosting classifier for each land- 
mark, which is similar with the method in lIZTl . Based on 
the output of Boosting classifiers, we can get a confidence 
map for each landmark. These confidence maps are feed to 
a Landmark Mean-Shift procedure |18|. Then we can get 
the final positions of all facial landmarks. For robustness 
and efficiency, the optimization process is repeated several 
times on two scales. 

The training set of our landmarker is constructed from 
the MUCT database [14J . Three views (a, d and e) with 
small pose variations are used for training. Because the 
background of images in the MUCT are almost uniform, 
we replace them with some random backgrounds and mir- 
ror all images to augment the dataset (see Figure |2]i. The 
uniform background of the face images are segmented by 
GrabCut 1 17 1, which is initialized by the results of face de- 
tection. Figure [T] shows two example images in the FERET 
database 1 16 1 and their 76 facial landmarks localized by the 
landmarker, from which we can see the landmarks are ro- 
bust to small pose variations and have good precision for 
the next steps. 

For geometry normalization, two eyes in the 76 land- 
marks are selected as reference points. Based on the ref- 
erence points, the face images are normalized to a standard 
frame with eye distance = 60 pixels by a similarity trans- 
formation. Note that the above similarity transformation is 
also applied on the 76 landmarks. And then DoG filter ||T91 
is used to reduce the influence of illumination on the face 
images. The parameters of DoG are: 7 = 0.2, ctq = 0.8, 
C7i ~ 1.6, dojiorm — 0. Figure |3] shows an aligned face 




Figure 2. Sample images in the MUCT training set for the ASM 
landmarker. Left: A color face image in the MUCT database, 
which has uniform background. Right: A face image is converted 
to gray scale and the background is replaced by a random image 
from the Internet. 




Figure 3. An aligned face image and its corresponding DoG fil- 
tered image. 

image and the DoG filtered image. 
3.2. Gabor Feature and LDA 

Given an aligned face image and the 76 landmarks, we 
extract local features on the landmarks by a Gabor wavelet, 
which is described in lITSl . 

Wy..a{^) = ^e^'\e'^--e-'^} (1) 

The wavelet is a plane wave with wave vector k, re- 
stricted by a Gaussian envelope, the size of which relative to 
the wavelength is parameterized by a. The second term in 
the brace removes the DC component. Following the pop- 
ular way, we sample the space of wave vectors k and scale 
a in a discrete hierarchy of 5 resolutions (differing by half- 
octaves) and 8 orientations at each resolution (See Figure]?]), 
thus giving 5 x 8 = 40 complex values for each landmark. 
Because the phase information is sensitive to image shift or 
mis-alignment, we just drop the phase and use the ampli- 
tude as feature for face recognition. 

Merging the feature values at all landmarks together, we 
get a feature vector with 76 x 40 = 3040 dimensions. To 
reduce the dimensionality of feature and remove the redun- 
dant information. Linear Discriminant Analysis (LDA) ||3] 
is used to learn a low dimensional subspace. In the LDA 
subspace, the similarity of feature vectors are evaluated by 
Cosine metric. 
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Figure 4. The real part of the Gabor wavelet in 5 resolutions and 8 
orientations. 
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In practice, we usually normalize the feature vector x and 
y to unit length as x' and y'. Then the Equation ^ can be 
written as 



.(x,y)=5(x',y')=x'V 



(3) 



4. Cascade Learning 



As described in Section [T] the objective of the cascade 
is to improve the speed of large scale face recognition and 
supply a flexible way to get a trade-off between the feature 
length and recognition rate. 

How to construct the cascade is feature and matching al- 
gorithm dependent. In this paper, we construct a specific 
cascade based on the Gabor-LDA feature and Cosine met- 
ric in the baseline method (see Section[3|. 

4.1. Structure 

The cascade has been widely used in face detection since 
the famous work of Viola and Jones \2\\. After that, there 
are many cascade variations arise for many specific applica- 
tions, such as Nested cascade 111], Soft cascade |5|, Boost- 
ing Chain |24| and so on. To use the existing features ef- 
ficiently, we adopt the nested structure. As shown in Fig- 
ure |5] we divide the feature template into several nested 
stages from coarse to fine. The first level "stage 1" is ac- 
tually the original feature template, which has the highest 
recognition rate. The "stage2" and "stage3" are coarse lev- 
els with lower recognition rate, and the feature template can 
be divided further 

When the structure of the cascade is determined, we can 
group the Gabor-LDA features in this way. In the experi- 
ments of this paper, the size of feature is 428, and the feature 
template is divided into a seven-stage structure. The sizes of 
the stages are: 6, 13, 26, 53, 107, 214, 428. For other appH- 
cations, the number of stage and the size of each stage can 
be chosen by experience or experiments. The recognition 
rate of each stage will be reported in the experiments. In 
resource constrained applications, we can drop some stages 
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Figure 5. The structure of the cascade classifier. 

at the end of feature template to save the storage and trans- 
mission overhead. 

4.2. Learning 

The target of the cascade classifier is to improve the 
feature matching speed. While training the cascade, the 
multi-class samples are firstly conveited to positive (sam- 
ple pairs from same subject) and negative samples (sam- 
ple pairs from different subject) by cross matching. Given 
two samples x' and y', the training sample is constructed by 
Equation Q as follows 



b = x'0y'. 



(4) 



where denote element-wise product. Given the nested 
structure of the cascade and the training samples, we need 
to learn the threshold of each stage to achieve some user 
defined Verification Rate (VR) (see Alg. [T]). To reduce the 
false reject rate of positive samples, we usually set the VR 
of each stage nearly to 100%. In this paper, we set the ver- 
ification rates for the seven stages as: 99.9%, 99.9%^ ~ 
99.8%, 99.9%^ = 99.7%, 99.9%^* = 99.6%, 99.9%^ = 
99.5%, 99.9%*' = 99.4%, 99.9%^ = 99.3%. 

Algorithm 1 Cascade learning procedure. 

1: Input: Positive samples {Ptj}, where ; = 1,2,- - 
i = 1,2, • • • ,n; n is sample count; d is feature dimen- 
sion. Note that {Pij} are constructed from the training 
set by Equation m is the size of each stage, v is the 
user defined VR of each stage. Stage count sn = 7. 

2: Output: Threshold t of each stage. 

3: Let cumulative similarity s = 0, / = 0, where the length 
of s is n. 

4: for = to in — 1 do 

5: while / < va[k] do 

6: s = s + P,H., where is the /th dimension of all 

samples; 

i = iH-l; 
end while 

Sorting s to find a threshold i[k] to let ^i^^ffll > 
y[k]. 
10: end for 

In the testing phase, two feature templates are matched 



from coarse to fine by Equation ([3]), i.e., from stage? to 
stage 1. If the similarity score is smaller than the threshold 
of current stage, the matching process is interrupted. Be- 
cause the probability of that the two feature templates com- 
ing from different subjects is very high, most of the match- 
ing pairs will be rejected by the early stages of the cascade. 
Therefore, the cascade can improve the matching speed sig- 
nificantly. The testing procedure is shown in the Alg. |2] 
Compared with ordinary linear scan way, the cascade only 
need two extra lines of code. This supplies a easy way to 
improve the speed of existing large scale face recognition 
system. 



Algorithm 2 Fast feature matching by the cascade classifier 
1: Input: Two normalized feature templates x' and y', fea- 
ture size d, feature size of each stage m, thresholds t, 
stage count sn. 
2: Output: Similarity s. 

3: Calculate the similarity in an incremental way: 

4: Let 5 = 0, / = 0; 

5: for ^ = to sn — 1 do 

6: while / < m[^] do 

7: s^s + x'[i]xy'[i\- 

8: i = i H- 1; 

9: end while 
10: if s < t[A:] then 
1 1 : break; 
12: end if 
13: end for 
14: return s. 



5. Experiments 
5.1. Database 

We use the FERET database to illustrate the advantages 
of our fast matching method following the standard test pro- 
tocols 1 16 |. The results of the four popular experiments are 
reported: fafb, fafc, fadupl and fadup2. In the experiments, 
fa is always used as gallery, and fb, fc, dupl and dup2 are 
used as probe respectively. The first two experiments fafb 
and fafc are already saturated by the most of state-of-the- 
art algorithms, but fadupl and fadup2 are still challenging 
due to the appearance variations caused by expression and 
aging. 

To simulate large scale face recognition applications, we 
collect a large database, which contains 200,000 face im- 
ages with frontal pose, uniform illumination and good qual- 
ity (These images can not be disclosed due to privacy is- 
sues). We use the self-collected database to enlarge the 
gallery of the FERET database to test the performance of 
the baseline method on the setting of big data. With a big 
gallery, we can observe the variation of recognition rate and 



Table 1. The baseline rank-1 recognition rate of the four experi- 
ments on the FERET database. 



Experiment 


fafb 


fafc 


fadupl 


fadup2 


Tan&Triggs |19| 


98.0% 


98.0% 


90.0% 


85.0% 


S-LGBPh-LGXP |25| 


99.0% 


99.0% 


94.0% 


93.0% 


G-LQP |20| 


99.9% 


100% 


93.2% 


91.0% 


Proposed Baseline 


99.67% 


100% 


93.35% 


92.74% 



speed-up ratio of the cascade more easily. Furthermore, we 
will analyze the relationship between the performance and 
feature size to supply a reference for feature cutting in re- 
source constrained applications. 

5.2. Baseline 

First, all face images in the training set, fa, fb, dupl and 
dup2 are processed by a face detector and the facial land- 
marker. We get 76 facial landmarks of each face image. 
Then, face images are normalize to a standard frame with 
eye distance — 60 pixels, and processed by DoG |19| filter 
to alleviate the affection of illumination. Note that, in the 
face normalization step, the 76 facial landmarks are trans- 
formed in the same way. 

At the transformed 76 facial landmarks, 40 x 76 = 3040 
Gabor features are extracted, and the dimension of Gabor 
feature is reduced to 428 (The number of subjects in the 
training set is 429) by LDA. Finally, the similarity is eval- 
uated by Cosine metric. The rank-1 recognition rate of the 
four experiments are shown in Table [T] From that, we can 
see the baseline method is comparable with many state-of- 
the-art methods lfT9l |25] l20l . 

While the other methods need to extract dense features 
from the face image, the proposed baseline only process the 
76 sparse landmarks. This will help to construct more effi- 
cient face recognition system in practical applications. Fur- 
thermore, the baseline method has a potential advantage that 
is pose robustness. No matter what the pose of face is, we 
always extract features at the 76 facial landmarks with fixed 
semantics. 

5.3. Large Scale Gallery 

To illustrate the performance in big data setting, we mix 
200,000 face images into the existing gallery as "Extended- 
fa", and the probe sets remain unchanged. Like fa set, the 
Extended-fa is also processed by the pipeline described in 
Section [3] Figure |6] shows the impact of the size of gallery 
to the recognition rate. We can see the last two experiments 
are more easily affected by the size of gallery than the first 
two. However, when the Extended-fa contains 200,000 + 
1196 images, the baseline still perform well. The rank-1 
recognition rate of the last two experiments drop about 4%, 
i.e., Extended-fadupl: from 93.35% to 89.89%, Extended- 
fadup2: from 92.74% to 89.32%. 
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Figure 6. The performance comparison of the baseline method be- 
fore and after the gallery expansion, where "Extended-" denote the 
gallery fa is extended by the extra 200, 000 face images. 



In the following, the performance of the cascade and par- 
tial cascade are also evaluated on the large extended gallery 
"Extended-fa". 

5.4. Cascade Evaluation 

The cascade is trained using the training set of the 
FERET and the settings described in Section]?] Its perfor- 
mance is assessed in two aspects: recognition rate loss, and 
computation speed up. 

All experiments are conducted on the extended FERET 
database. The experimental platform is a normal PC with 
2.4GHz CPU, 4G Memory. Table ]2] list the rank-1 recog- 
nition rate of normal and cascade based matching methods. 
It's amazing that the recognition rate of the cascade keeps 
as same as the original Cosine metric with zero performance 
loss. The 1th column of the table shows the average time to 
scan the gallery one time of all experiments, which indicate 
that the average matching speed is improved significantly 
by 7.55 times. 

5.5. Feature Length vs. Recognition Rate 

On one hand, the cascade provides a fast matching 
method for large scale face recognition systems. On the 
other hand, it also provides a flexible structure for the sys- 
tem to cut off feature length to save the storage and trans- 
mission bandwidth. Here, Figure ]7] gives the relationship 
between the recognition rate and feature length for refer- 
ence. 

We test the four experiments using partial cascade that 
means we just evaluate the similarity of two samples by 
the first k stages in the cascade and neglect the last sev- 
eral stages. The x-axis in Figure]?] is the count of stage we 
used. From the figure, we can see the speed of improve- 
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Figure 7. The relationship between the recognition rate and feature 
length on the extended FERET database. 



ment of recognition rate with respect to the stage count (or 
feature length) is decrease. That means, as the length of fea- 
ture increases, the recognition rate will become harder and 
harder to improve. Therefore, we can cut off a large part 
at the end of the feature to get a smaller feature template, 
while not sacrificing much recognition rate. For example, 
when the length of feature template is cut from 428 to 214 
dimension, i.e., use 6 stages, the recognition rate of exper- 
iment "fadup2" just drop 3.42%. This property supplies a 
large room for system-level optimization for some specific 
applications. 

6. Conclusions 

Computation speed, storage capacity, and transmission 
bandwidth are three key factors for large scale face recogni- 
tion systems. To improve the efficiency of system in terms 
of these factors, we propose a cascade classifier and its 
learning algorithm. Through extensive experiments on the 
FERET and a large self-collected database, we find the cas- 
cade can improve the face matching speed by 7.55 times 
with zero recognition rate loss. The cascade also supplied a 
flexible structure for resource constrained applications, by 
which we could drop half of features just with minor per- 
formance loss. Furthermore, the proposed baseline method 
(landmark based GaborH-LDA) is comparable to many state- 
of-the-art methods on the FERET database, which will be 
studied further in the future and applied in unconstrained 
face recognition problems, e.g., LEW. 
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Extended-fafb 


Extended-fafc 


Extended-f adup 1 


Extended-fadup2 


Total Time 


Time/Query 


Speed-up Ratio 


Normal 


99.58% 


98.97% 


89.89% 


89.32% 
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129.48ms 


1 


Cascade 
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89.32% 










21060ms 


3105ms 
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4103ms 
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17.16ms 


7.55 
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